Abstract

In natural-language discourse, related events 
tend to appear near each other to describe a 
larger scenario. Such structures can be formal- 
ized by the notion of a frame (a.k.a. template), 
which comprises a set of related events and 
prototypical participants and event transitions. 
Identifying frames is a prerequisite for infor- 
mation extraction and natural language gen- 
eration, and is usually done manually. Meth- 
ods for inducing frames have been proposed 
recently, but they typically use ad hoc proce- 
dures and are difficult to diagnose or extend. 
In this paper, we propose the first probabilistic
approach to frame induction, which incorporates
frames, events, and participants as latent
topics and learns those frame and event tran- 
sitions that best explain the text. The number 
of frames is inferred by a novel application of 
a split-merge method from syntactic parsing. 
In end-to-end evaluations from text to induced 
frames and extracted facts, our method pro- 
duced state-of-the-art results while substan- 
tially reducing engineering effort. 

1 Introduction 

Events with causal or temporal relations tend to oc- 
cur near each other in text. For example, a bomb- 
ing scenario in an article on terrorism might begin 
with a DETONATION event, in which terrorists set 
off a bomb. Then, a DAMAGE event might ensue
to describe the resulting destruction and any casual- 
ties, followed by an INVESTIGATION event cov- 
ering subsequent police investigations. Afterwards, 
the bombing scenario may transition into a criminal- 
processing scenario, which begins with police catch- 
ing the terrorists, and proceeds to a trial, sentenc- 
ing, etc. A common set of participants serves as 
the event arguments; e.g., the agent (or subject) of 
DETONATION is often the same as the theme (or ob- 
ject) of INVESTIGATION and corresponds to the 
PERPETRATOR. 

Such structures can be formally captured by the 
notion of a frame (a.k.a. template), which consists 
of a set of events with prototypical transitions, as 
well as a set of slots representing the common par- 
ticipants. Identifying frames is an explicit or im- 
plicit prerequisite for many NLP tasks. Informa- 
tion extraction, for example, stipulates the types of 
events and slots that are extracted for a frame or 
template. Online applications such as dialogue sys- 
tems and personal-assistant applications also model 
users' goals and subgoals using frame-like represen- 
tations, and in natural-language generation, frames 
are often used to represent content to be expressed 
as well as to support surface realization. 

Until recently, frames and related representations 
have been manually constructed, which has limited 
their applicability to a relatively small number of do- 
mains and a few slots within a domain. Furthermore, 
additional manual effort is needed after the frames 
are defined in order to extract frame components 
from text (e.g., in annotating examples and designing
features to train a supervised learning model).
This paradigm makes it hard to generalize across
tasks and might suffer from annotator bias.

Recently, there has been increasing interest in
automatically inducing frames from text. A notable
example is Chambers and Jurafsky (2011), which
first clusters related verbs to form frames, and then
clusters the verbs' syntactic arguments to identify
slots. While Chambers and Jurafsky (2011) represents
a major step forward in frame induction, it is
also limited in several aspects. The clustering used 
ad hoc steps and customized similarity metrics, as 
well as an additional retrieval step from a large ex- 
ternal text corpus for slot generation. This makes it 
hard to replicate their approach or adapt it to new 
domains. Lacking a coherent model, it is also diffi- 
cult to incorporate additional linguistic insights and 
prior knowledge. 

In this paper, we present ProFinder (PROba- 
bilistic Frame INDucER), which is the first proba- 
bilistic approach for frame induction. ProFinder 
defines a joint distribution over the words in a 
document and their frame assignments by model- 
ing frame and event transition, correlations among 
events and slots, and their surface realizations. 
Given a set of documents, ProFinder outputs a 
set of induced frames with learned parameters, as 
well as the most probable frame assignments that 
can be used for event and entity extraction. The 
numbers of events and slots are dynamically deter- 
mined by a novel application of the split-merge ap- 
proach from syntactic parsing (Petrov et al., 2006). 
In end-to-end evaluations from text to entity ex- 
traction using the standard MUC and TAC datasets, 
ProFinder achieved state-of-the-art results while sig- 
nificantly reducing engineering effort and requiring 
no external data. 

2 Related Work 



In information extraction and other semantic pro- 
cessing tasks, the dominant paradigm requires two 
stages of manual effort. First, the target representa- 
tion is defined manually by domain experts. Then, 
manual effort is required to construct an extractor or 
annotate examples to train a machine-learning system.
Recently, there has been a burgeoning body
of work in alleviating such manual effort. For example,
a popular approach to reduce annotation effort is
bootstrapping from seed examples (Patwardhan and
Riloff, 2007; Huang and Riloff, 2012). However,
this still requires prespecified frames or templates,
and selecting seed words is often a challenging task
due to semantic drift (Curran et al., 2007). Open
IE (Banko and Etzioni, 2008) reduces the manual
effort to designing a few domain-independent relation
patterns, which can then be applied to extract
relational triples from text. While extremely scalable,
this approach can only extract atomic factoids
within a sentence, and the resulting triples are noisy,
non-canonicalized text fragments.

More relevant to our approach is the recent work
in unsupervised semantic induction, such as
unsupervised semantic parsing (Poon and Domingos,
2009), unsupervised semantic role labeling (Swier
and Stevenson, 2004) and induction (Lang and
Lapata, 2011, e.g.), and slot induction from web search
logs (Cheung and Li, 2012). As in ProFinder,
they also model distributional contexts for slot or
role induction. However, these approaches focus on
semantics in independent sentences, and do not
capture discourse-level dependencies.

The modeling component for frame and event
transitions in ProFinder is similar to a sequential
topic model (Gruber et al., 2007), and is inspired
by the successful applications of such topic
models in summarization (Barzilay and Lee, 2004;
Daumé III and Marcu, 2006; Haghighi and
Vanderwende, 2009, inter alia). There are, however, two
main differences. First, ProFinder contains not
a single sequential topic model, but two (for frames
and events, respectively). In addition, it also models
the interdependencies among events, slots, and
surface text, which is analogous to the USP model
(Poon and Domingos, 2009). ProFinder can thus
be viewed as a novel combination of state-of-the-art
models in unsupervised semantics and discourse
modeling.

In terms of aim and capability, ProFinder
is most similar to Chambers and Jurafsky (2011),
which was the culmination of a series of work on
identifying correlated events and arguments in narrative
(Chambers and Jurafsky, 2008; Chambers and
Jurafsky, 2009). By adopting a probabilistic approach,
ProFinder has a sound theoretical underpinning,
and is easy to modify or extend. For example, in
Section 3, we show how ProFinder can easily be
augmented with additional linguistically-motivated
features. Likewise, ProFinder can easily be used
as a semi-supervised system if some slot designations
and labeled examples are available.

The idea of representing and capturing stereotypical
knowledge has a long history in artificial
intelligence and psychology, and has assumed various
names such as frames (Minsky, 1974), schemata
(Rumelhart, 1975), and scripts (Schank and
Abelson, 1977). In the linguistics and computational
linguistics communities, frame semantics (Fillmore,
1982) uses frames as the central representation of
word meaning, culminating in the development of
FrameNet (Baker et al., 1998), which contains over
1000 manually annotated frames. A similarly rich
lexical resource is the MindNet project (Richardson
et al., 1998). Our notion of frame is related
to these representations, but there are also subtle
differences. For example, Minsky's frame empha- 
sizes inheritance, which we do not model in this pa- 
per. (It should be a straightforward extension: us- 
ing the split-and-merge approach, ProFinder already 
produces a hierarchy of events and slots in learning, 
although currently, it simply discards the intermedi- 
ate levels.) As in semantic role labeling, FrameNet 
focuses on semantic roles and does not model event 
or frame transitions, so the scope of its frames is of- 
ten no more than an event in our model. Perhaps 
the most similar to our frame is Roger Schank's 
scripts, which capture prototypical events and par- 
ticipants in a scenario such as restaurant dining. In 
their approach, however, scripts are manually de- 
fined, making it hard to generalize. In this regard, 
our work may be viewed as an attempt to revive a 
long tradition in AI and linguistics, by leveraging
the recent advances in computational power, NLP, 
and machine learning. 

3 Probabilistic Frame Induction 

In this section, we present ProFinder, a probabilistic
model for frame induction. Let $\mathcal{F}$ be a set of
frames, where each frame $F = (E_F, S_F)$ comprises
a unique set of events $E_F$ and slots $S_F$. Given a
document $D$ and a word $w$ in $D$, $Z_w = (f, e)$ represents
an assignment of $w$ to frame $f \in \mathcal{F}$ and frame
element $e \in E_f \cup S_f$. At the heart of ProFinder
is a generative model $P_\theta(D, Z)$ that defines a joint
distribution over document $D$ and the frame assignment
to its words $Z$. Given a set of documents $\mathcal{D}$,
frame induction in ProFinder amounts to determining
the number of frames, events and slots, as well
as learning the parameters $\theta$ by summing out the
latent assignments $Z$ to maximize the likelihood of the
document set

$$\prod_{D \in \mathcal{D}} P_\theta(D).$$

The induced frames identify the key event structures 
in the document set. Additionally, ProFinder 
can also conduct event and entity extraction by 
computing the most probable frame assignment Z. 
In the remainder of the section, we first present 
the base model for ProFinder. We then intro- 
duce several linguistically motivated refinements, 
and efficient algorithms for learning and inference 
in ProFinder. 

3.1 Base Model 

The probabilistic formulation of ProFinder 
makes it extremely flexible for incorporating lin- 
guistic intuition and prior knowledge. In this paper, 
we design our ProFinder model to capture three 
types of dependencies. 

Frame transitions between clauses A sentence 
contains one or more clauses, each of which is a 
minimal unit expressing a proposition. A clause is 
unlikely to straddle across different frames, so we 
stipulate that the words in a clause be assigned to 
the same frame. On the other hand, frame transitions 
can happen between clauses, and we adopt the com- 
mon Markov assumption that the frame of a clause 
only depends on the clause immediately to its left. 
Here, sentences are ordered sequentially as they ap- 
pear in the documents. Clauses are automatically 
extracted from the dependency parse and further de- 
composed into an event head and its syntactic argu- 
ments; see the experiment section for details. 

Event transitions within a frame Events tend to 
transition into related events in the same frame, as 
determined by their causal or temporal relations. 
Each clause is assigned an event compatible with 
its frame assignment (i.e., the event is in the given 
frame). As for frame transitions, we assume that the 
event assignment of a clause depends only on the 
event of the previous clause. 



Emission of event heads and slot words Similar
to topics in topic models, each event determines a
multinomial from which the event head is generated.
E.g., a detonation event might use verbs such as detonate,
set off or nouns such as detonation, bombing
as its event head. Additionally, as in USP (Poon and
Domingos, 2009), an event also contains a multinomial
of slots for each of its argument types.¹ E.g.,
the agent argument of a detonation event is generally
the PERPETRATOR slot of the BOMBING frame. Finally,
each slot has its own multinomials for generating
the argument head and dependency label,
regardless of the event.

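Formally, let $D$ be a document and $C_1, \ldots, C_l$ be
its clauses; the ProFinder model is defined by

$$
\begin{aligned}
P_\theta(D, Z) ={} & P_{\text{F-INIT}}(F_1) \times \prod_i P_{\text{F-TRAN}}(F_{i+1} \mid F_i) \\
& \times P_{\text{E-INIT}}(E_1 \mid F_1) \times \prod_i P_{\text{E-TRAN}}(E_{i+1} \mid E_i, F_{i+1}, F_i) \\
& \times \prod_i P_{\text{E-HEAD}}(e_i \mid E_i) \times \prod_{i,j} P_{\text{SLOT}}(S_{i,j} \mid E_i, A_{i,j}) \\
& \times \prod_{i,j} P_{\text{A-HEAD}}(a_{i,j} \mid S_{i,j}) \times \prod_{i,j} P_{\text{A-DEP}}(dep_{i,j} \mid S_{i,j})
\end{aligned}
$$

Here, $F_i$, $E_i$ denote the frame and event assignment
to clause $C_i$, respectively, and $e_i$ denotes the
event head. For the $j$-th argument of clause $i$,
$S_{i,j}$ denotes the slot assignment, $A_{i,j}$ the argument
type, $a_{i,j}$ the head word, and $dep_{i,j}$ the dependency
from the event head. $P_{\text{E-TRAN}}(E_{i+1} \mid E_i, F_{i+1}, F_i) =
P_{\text{E-INIT}}(E_{i+1} \mid F_{i+1})$ if $F_{i+1} \neq F_i$.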

Essentially, ProFinder combines a frame 
HMM with an event HMM, where the first mod- 
els frame transition and emits events, and the second 
models event transition within a frame and emits ar- 
gument slots. 
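As a rough illustration of how the factors of the base model combine, the following sketch scores one fully assigned toy document. The table layout, the names (`params`, `base_model_log_prob`), and all probabilities are hypothetical choices made for this example, not part of ProFinder; event transitions are indexed by event only, on the assumption that each event belongs to exactly one frame.

```python
import math

def base_model_log_prob(clauses, params):
    """Log of the base-model joint P(D, Z) for one fully assigned document.

    Each clause is a dict with keys "frame", "event", "head", and "args";
    "args" is a list of (slot, arg_type, arg_head, dep) tuples.
    """
    lp = 0.0
    prev = None
    for c in clauses:
        f, e = c["frame"], c["event"]
        if prev is None:
            # First clause: initial frame and initial event given the frame.
            lp += math.log(params["f_init"][f])
            lp += math.log(params["e_init"][f][e])
        else:
            lp += math.log(params["f_tran"][prev["frame"]][f])
            if f == prev["frame"]:
                # Same frame: event HMM transition.
                lp += math.log(params["e_tran"][prev["event"]][e])
            else:
                # Frame changed: the event is re-drawn from the initial distribution.
                lp += math.log(params["e_init"][f][e])
        # Emission of the event head.
        lp += math.log(params["e_head"][e][c["head"]])
        # Emission of each argument: slot, then head word and dependency.
        for slot, arg_type, arg_head, dep in c["args"]:
            lp += math.log(params["slot"][(e, arg_type)][slot])
            lp += math.log(params["a_head"][slot][arg_head])
            lp += math.log(params["a_dep"][slot][dep])
        prev = c
    return lp

# Hypothetical toy parameters (all values invented for illustration).
params = {
    "f_init": {"BOMBING": 1.0},
    "f_tran": {"BOMBING": {"BOMBING": 1.0}},
    "e_init": {"BOMBING": {"DETONATION": 0.5, "DAMAGE": 0.5}},
    "e_tran": {"DETONATION": {"DETONATION": 0.5, "DAMAGE": 0.5},
               "DAMAGE": {"DETONATION": 0.5, "DAMAGE": 0.5}},
    "e_head": {"DETONATION": {"detonate": 0.5, "bombing": 0.5},
               "DAMAGE": {"destroy": 1.0}},
    "slot": {("DETONATION", "subj"): {"PERPETRATOR": 1.0},
             ("DAMAGE", "obj"): {"TARGET": 1.0}},
    "a_head": {"PERPETRATOR": {"terrorist": 0.5, "guerrilla": 0.5},
               "TARGET": {"building": 1.0}},
    "a_dep": {"PERPETRATOR": {"nsubj": 1.0}, "TARGET": {"dobj": 1.0}},
}
doc = [
    {"frame": "BOMBING", "event": "DETONATION", "head": "detonate",
     "args": [("PERPETRATOR", "subj", "terrorist", "nsubj")]},
    {"frame": "BOMBING", "event": "DAMAGE", "head": "destroy",
     "args": [("TARGET", "obj", "building", "dobj")]},
]
log_p = base_model_log_prob(doc, params)
```

In this toy setting four factors equal 0.5 and the rest equal 1, so the joint probability is 0.5 to the fourth power.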



1 USP generates the argument types along with events from
clustering. For simplicity, in ProFinder we simply classify 
a syntactic argument into subject, object, and prepositional ob- 
ject, according to its Stanford dependency to the event head. 



3.2 Model refinements 

The base model captures the main dependencies in 
event narrative, but it can be easily extended to lever- 
age additional linguistic intuition. ProFinder in- 
corporates three such refinements. 

Background frame Event narratives often con- 
tain interjections of general content common to all 
frames. For example, in newswire articles, ATTRI- 
BUTION is commonplace to describe who said or 
reported a particular quote or fact. To avoid con- 
taminating frames with generic content, we intro- 
duce a background frame with its own events, slots, 
and emission distributions, and a binary switch variable
$B_i \in \{\text{BKG}, \text{CNT}\}$ that determines whether
clause $i$ is generated from the actual content frame
$F_i$ (CNT) or background (BKG). We also stipulate
that if background is chosen, the nominal frame
stays the same as the previous clause.

Stickiness in frame and event transitions Prior
work has demonstrated that promoting topic coherence
in natural-language discourse helps discourse
modeling (Barzilay and Lee, 2004). We extend
ProFinder to leverage this intuition by incorporating
a "stickiness" prior (Haghighi and Vanderwende,
2009) to encourage neighboring clauses to
stay in the same frame. Specifically, along with
introducing the background frame, the frame transition
component now becomes

$$
P_{\text{F-TRAN}}(F_{i+1} \mid F_i, B_{i+1}) =
\begin{cases}
\mathbb{1}(F_{i+1} = F_i), & \text{if } B_{i+1} = \text{BKG} \\
\beta\,\mathbb{1}(F_{i+1} = F_i) + (1 - \beta)\,P_{\text{F-TRAN}}(F_{i+1} \mid F_i), & \text{if } B_{i+1} = \text{CNT}
\end{cases}
\tag{1}
$$

where $\beta$ is the stickiness parameter, and the event
transition component correspondingly becomes

$$
P_{\text{E-TRAN}}(E_{i+1} \mid E_i, F_{i+1}, F_i, B_{i+1}) =
\begin{cases}
\mathbb{1}(E_{i+1} = E_i), & \text{if } B_{i+1} = \text{BKG} \\
P_{\text{E-TRAN}}(E_{i+1} \mid E_i), & \text{if } B_{i+1} = \text{CNT},\ F_i = F_{i+1} \\
P_{\text{E-INIT}}(E_{i+1}), & \text{if } B_{i+1} = \text{CNT},\ F_i \neq F_{i+1}
\end{cases}
\tag{2}
$$
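The two case distinctions above can be read as small functions. The sketch below is only an illustration under stated assumptions: the function names, the dictionary-based tables, and the toy values of `f_tran` and `beta` are hypothetical, and the initial event distribution is indexed by event only, following Equation (2) as printed.

```python
def sticky_frame_transition(f_next, f_prev, b_next, f_tran, beta):
    """Frame transition with background switch and stickiness, as in Eq. (1)."""
    same = 1.0 if f_next == f_prev else 0.0
    if b_next == "BKG":
        # Background clause: the nominal frame is copied from the previous clause.
        return same
    # Content clause: mix "stay put" with the learned transition.
    return beta * same + (1.0 - beta) * f_tran[f_prev][f_next]

def event_transition(e_next, e_prev, f_next, f_prev, b_next, e_tran, e_init):
    """Event transition with background switch, as in Eq. (2)."""
    if b_next == "BKG":
        return 1.0 if e_next == e_prev else 0.0
    if f_next == f_prev:
        return e_tran[e_prev][e_next]
    # The frame changed, so the event is drawn from the initial distribution.
    return e_init[e_next]

# Hypothetical toy values.
f_tran = {"A": {"A": 0.6, "B": 0.4}, "B": {"A": 0.3, "B": 0.7}}
beta = 0.5
stay = sticky_frame_transition("A", "A", "CNT", f_tran, beta)
move = sticky_frame_transition("B", "A", "CNT", f_tran, beta)
```

With these values, staying in frame A has probability 0.5 + 0.5 × 0.6 = 0.8 and moving to B has probability 0.5 × 0.4 = 0.2, so the mixed distribution still sums to one.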

Argument dependencies as caseframes As noticed
in previous work such as Chambers and Jurafsky
(2011), the combination of an event head
and a dependency relation often gives a strong signal
of the slot that is indicated. For example,
bomb>nsubj often indicates a PERPETRATOR.
Thus, rather than simply emitting the dependency
from the event head to an event argument $dep_{i,j}$, our
model instead emits the pair of event head and
dependency relation, which we call a caseframe
following Bean and Riloff (2004).

Figure 1: Graphical representation of our model.
Hyperparameters, the stickiness factor, and the frame and event
initial and transition distributions are not shown for clarity.



3.3 Full generative story 

To summarize, the distributions that are learned by
our model are the default distributions $P_{\text{BKG}}(B)$,
$P_{\text{F-INIT}}(F)$, $P_{\text{E-INIT}}(E)$, the transition
distributions $P_{\text{F-TRAN}}(F_{i+1} \mid F_i)$, $P_{\text{E-TRAN}}(E_{i+1} \mid E_i)$,
and the emission distributions $P_{\text{SLOT}}(S \mid E, A, B)$,
$P_{\text{E-HEAD}}(e \mid E, B)$, $P_{\text{A-HEAD}}(a \mid S)$, $P_{\text{A-DEP}}(dep \mid S)$.
We used additive smoothing with uniform Dirichlet
priors for all the multinomials. The overall
generative story of our model is as follows:

1. Draw a Bernoulli distribution for $P_{\text{BKG}}(B)$

2. Draw the frame, event, and slot distributions

3. Draw an event head emission distribution
$P_{\text{E-HEAD}}(e \mid E, B)$ for each frame including the
background frame

4. Draw event argument lemma and caseframe
emission distributions for each slot in each
frame including the background frame

5. For each clause in each document, generate the
clause-internal structure.

The clause-internal structure at clause $i$ is generated
by the following steps:

1. Generate whether this clause is background
($B_i \in \{\text{CNT}, \text{BKG}\} \sim P_{\text{BKG}}(B)$)

2. Generate the frame $F_i$ and event $E_i$ from
$P_{\text{F-INIT}}(F)$, $P_{\text{E-INIT}}(E)$, or according to
Equations (1) and (2)

3. Generate the observed event head $e_i$ from
$P_{\text{E-HEAD}}(e_i \mid E_i)$.

4. For each event argument:

(a) Generate the slot $S_{i,j}$ from
$P_{\text{SLOT}}(S \mid E, A, B)$.

(b) Generate the dependency/caseframe emission
$dep_{i,j} \sim P_{\text{A-DEP}}(dep \mid S)$ and the
lemma of the head word of the event argument
$a_{i,j} \sim P_{\text{A-HEAD}}(a \mid S)$.

3.4 Learning and Inference 

Our generative model admits efficient inference by 
dynamic programming. In particular, after collaps- 
ing the latent assignment of frame, event, and back- 
ground into a single hidden variable for each clause, 
the expectation and most probable assignment can 
be computed using standard forward-backward and 
Viterbi algorithms. 
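Once frame, event, and background switch are collapsed into one hidden state per clause, the most probable assignment is a standard Viterbi decode. The following is a minimal log-space sketch of that idea, not the authors' implementation: the state tuples, the toy tables, and the helper `to_log` are hypothetical, and only event heads are treated as observations.

```python
import math

def viterbi(observations, states, log_init, log_tran, log_emit):
    """Most probable state sequence for one document, in log space."""
    best = {s: log_init[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this clause.
            prev = max(states, key=lambda p: best[p] + log_tran[p][s])
            new_best[s] = best[prev] + log_tran[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]

def to_log(table):
    """Convert a (possibly nested) table of positive probabilities to logs."""
    return {k: (to_log(v) if isinstance(v, dict) else math.log(v))
            for k, v in table.items()}

# Hypothetical collapsed states: (frame, event, background switch).
DET = ("BOMBING", "DETONATION", "CNT")
DMG = ("BOMBING", "DAMAGE", "CNT")
states = [DET, DMG]
log_init = to_log({DET: 0.5, DMG: 0.5})
log_tran = to_log({DET: {DET: 0.3, DMG: 0.7}, DMG: {DET: 0.4, DMG: 0.6}})
log_emit = to_log({DET: {"detonate": 0.9, "destroy": 0.1},
                   DMG: {"detonate": 0.2, "destroy": 0.8}})
path, score = viterbi(["detonate", "destroy"], states, log_init, log_tran, log_emit)
```

Replacing each `max` with a log-sum-exp over predecessors would give the forward pass used to compute expectations.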

Parameter learning can be done using EM by
alternating the computation of expected counts and the
maximization of multinomial parameters. In particular,
ProFinder used incremental EM, which
has been shown to have better and faster convergence
properties than standard EM (Liang and
Klein, 2009).



Determining the optimal number of events and 
slots is challenging. One solution is to adopt non- 
parametric Bayesian methods by incorporating a hi- 
erarchical prior over the parameters (e.g., a Dirich- 
let process). However, this approach can impose 
unrealistic restrictions on the model choice and re- 
sult in intractability which requires sampling or ap- 
proximate inference to overcome. Additionally, EM 
learning can suffer from local optima due to its non- 
convex learning objective, especially when dealing 
with a large number of hidden states without a good
initialization. 

To address these issues, we adopt a novel application
of the split-merge method previously used in
syntactic parsing for inferring refined latent syntactic
categories (Petrov et al., 2006). Specifically, we
initialize our model such that each frame is associated
with one event and two slots. Then, after a number
of iterations of EM, we split each event and slot
in two along with their probability, and duplicate the
associated emission distributions. We then add some
perturbation to break symmetry. After splitting, we
merge back a proportion of the newly split events
and slots that result in the least improvement in the
likelihood of the training data. For more details on
split-merge, see Petrov et al. (2006).
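The split step described above can be sketched as follows. This is an illustrative guess at one reasonable realization, not the procedure used in ProFinder: the function name, the multiplicative noise scheme, and the `noise` value are all assumptions, and the merge step (which requires estimating the likelihood loss of undoing each split) is omitted.

```python
import random

def split_state(name, prior, emission, noise=0.01, rng=random):
    """Split one event or slot into two children.

    The parent's probability mass is halved between the children, and the
    emission distribution is duplicated, perturbed slightly to break
    symmetry, and renormalized.
    """
    children = {}
    for suffix in ("a", "b"):
        perturbed = {w: p * (1.0 + rng.uniform(-noise, noise))
                     for w, p in emission.items()}
        total = sum(perturbed.values())
        children[f"{name}_{suffix}"] = (
            prior / 2.0,
            {w: p / total for w, p in perturbed.items()},
        )
    return children

# Hypothetical slot with a toy head-word distribution.
children = split_state(
    "SLOT0", 0.4,
    {"guerrilla": 0.5, "police": 0.3, "priest": 0.2},
    rng=random.Random(0),
)
```

Because each child starts close to its parent, EM after a split resumes from (a perturbed version of) the previously learned parameters rather than from a random point.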

By adjusting the number of split-merge cycles and
the merge parameters, our model learns the number
of events and slots dynamically, in a fashion that is
tailored to the data. Moreover, our model starts with
a small number of frame elements, which reduces
the number of local optima and makes initial learning
easier. After each split, the subsequent learning
starts with (a perturbed version of) the previously 
learned parameters, which makes a good initializa- 
tion that is crucial for EM. Finally, it is also compat- 
ible with the hierarchical nature of events and slots. 
For example, slots can first be coarsely split into per- 
sons versus locations, and later refined into subcate- 
gories such as perpetrators and victims. 

4 MUC-4 Entity Extraction Experiments 

We first evaluate our model on a standard entity
extraction task, using the evaluation settings from
Chambers and Jurafsky (2011) to enable a head-to-head
comparison. Specifically, we use the MUC-4
data set (MUC, 1992), which contains 1300 training
and development documents on terrorism in South
America, with 200 additional documents for testing.
MUC-4 contains four templates: attack, kidnapping,
bombing, and arson.² All templates share the same
set of predefined slots, with the evaluation focusing
on the following four: perpetrator, physical target,
human target, and instrument.

2 Two other templates have negligible counts and are ignored,
as in Chambers and Jurafsky (2011).

For each slot in a MUC template, the system first
identified an induced slot that best maps to it by $F_1$
on the development set. As in Chambers and Jurafsky
(2011), the template is ignored in the final evaluation,
so the system merged the induced slots across all
templates to calculate the final scores. Correctness
is determined by matching head words, and slots
marked as optional in MUC are ignored when computing
recall. All hyper-parameters are tuned on the
development set.³

Document classification The MUC-4 dataset
contains many documents that contain words related
to MUC slots (e.g., plane and aviation), but are not
about terrorism. To reduce precision errors, Chambers
and Jurafsky (2011) (henceforth, C&J) first
filtered irrelevant documents based on the specificity
of event heads to learned frames. To estimate the
specificity, they used additional data retrieved from a
large external corpus. In ProFinder, however, specificity
can be easily estimated using the probability
distributions learned during training. In particular,
we define the probability of an event head in a frame
$F$:

$$P_F(w) = \sum_{E \in E_F} P_{\text{E-HEAD}}(w \mid E) \,/\, |F|, \tag{3}$$

and the probability of a frame given an event head:

$$P(F \mid w) = P_F(w) \,\Big/ \sum_{F' \in \mathcal{F}} P_{F'}(w). \tag{4}$$

We then follow the rest of Chambers and Jurafsky
(2011) to score each learned frame with each MUC
document, mapping a document to a frame if the
average $P_F(w)$ in the document is above a threshold
and the document contains at least one trigger word
$w'$ with $P(F \mid w') > 0.2$. The threshold and the
induced frame were determined on the development
set, and were then used to filter irrelevant documents
in the test set.
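Equations (3) and (4) and the relevance test can be sketched in a few lines. The code below is a hedged illustration, not the authors' filter: the function names, the toy frames and probabilities, and the value of `avg_threshold` are hypothetical, and $|F|$ is assumed to be the number of events in the frame. Only the trigger threshold of 0.2 comes from the text.

```python
def frame_head_prob(word, frame_events, e_head):
    """Eq. (3): average over the frame's events of P_E-HEAD(word | event)."""
    return sum(e_head[ev].get(word, 0.0) for ev in frame_events) / len(frame_events)

def frame_given_head(word, frame, frames, e_head):
    """Eq. (4): P_F(word) normalized over all frames."""
    denom = sum(frame_head_prob(word, evs, e_head) for evs in frames.values())
    if denom == 0.0:
        return 0.0
    return frame_head_prob(word, frames[frame], e_head) / denom

def is_relevant(doc_heads, frame, frames, e_head, avg_threshold, trigger_threshold=0.2):
    """Map a document to a frame if its average P_F(w) exceeds the threshold
    and at least one event head is a trigger word for the frame."""
    if not doc_heads:
        return False
    avg = sum(frame_head_prob(w, frames[frame], e_head) for w in doc_heads) / len(doc_heads)
    has_trigger = any(
        frame_given_head(w, frame, frames, e_head) > trigger_threshold for w in doc_heads
    )
    return avg > avg_threshold and has_trigger

# Hypothetical learned frames and event-head emissions.
frames = {"BOMBING": ["DETONATION", "DAMAGE"], "TRAVEL": ["FLIGHT"]}
e_head = {
    "DETONATION": {"detonate": 0.6, "explode": 0.4},
    "DAMAGE": {"destroy": 0.8, "explode": 0.2},
    "FLIGHT": {"fly": 0.9, "explode": 0.1},
}
p_bombing_explode = frame_head_prob("explode", frames["BOMBING"], e_head)
p_frame_given_explode = frame_given_head("explode", "BOMBING", frames, e_head)
```

Here "explode" has $P_F(w) = (0.4 + 0.2)/2 = 0.3$ under BOMBING and $0.1$ under TRAVEL, so $P(\text{BOMBING} \mid \text{explode}) = 0.3/0.4 = 0.75$, which clears the 0.2 trigger threshold.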

Results Compared to C&J, ProFinder is conceptually
much simpler, involving a single probabilistic
model, with standard learning and inference
algorithms. In particular, it did not require multiple
processing steps or customized similarity metrics;
rather, it only used the data within MUC-4. In
contrast, C&J required additional text to be retrieved
from a large external corpus (Gigaword (Graff et al.,
2005)) for each event cluster, yet ProFinder nevertheless
was able to outperform C&J on entity extraction,
as shown in Table 1. Our system achieved
good recall but was hurt by the lower precision. We
investigated the importance of document classification
by only extracting from the gold-standard relevant
documents (+doc. classification), which led to
a substantial improvement in precision, suggesting
possible further improvement by better document
classification. Also unlike C&J, our system does not
currently make use of coreference information.

3 We will make the parameter settings used in all experiments
publicly available.

                                    P    R    F1
Unsupervised methods
  ProFinder (This work)            32   37   34
  Chambers and Jurafsky (2011)     48   25   33
With extra information
  ProFinder +doc. classification   41   44   43
  C&J 2011 +granularity            44   36   40

Table 1: Results on MUC-4 entity extraction. C&J 2011
+granularity refers to their experiment in which they
mapped one of their templates to five learned clusters
rather than one.

Frame: Terrorism
  Event: Attack
    report, participate, kidnap, kill, release
  Event: Discussion
    hold, meeting, talk, discuss, investigate
  Slot: Perpetrator (Person/Org)
    Words: guerrilla, police, source, person, group
    Caseframes: report>nsubj, kidnap>nsubj, kill>nsubj,
      participate>nsubj, release>nsubj
  Slot: Victim (Person/Org)
    Words: people, priest, leader, member, judge
    Caseframes: kill>dobj, murder>dobj, release>dobj,
      report>dobj, kidnap>dobj

Figure 2: A partial frame learned by ProFinder from the
MUC-4 data set, with the most probable emissions for
each event and slot. Labels are assigned by the authors
for readability.

Figure 2 shows part of a frame that is learned by
ProFinder, including some of the standard MUC
slots and events. Our method also finds events not
annotated in MUC, such as the discussion event.
Other interesting events and slots that we noticed
include an arrest event (call, arrest, express, meet,
charge), a peace agreement slot (agreement, rights,
law, proposal), and an authorities slot (police,
government, force, command). The background frame
was able to capture many verbs related to reporting,
such as say, continue, add, believe, although it
missed report.

(a) Accidents and Natural Disasters:

WHAT: what happened

WHEN: date, time, other temporal markers

WHERE: physical location

WHY: reasons for accident/disaster

WHO_AFFECTED: casualties...

DAMAGES: ... caused by the disaster

COUNTERMEASURES: rescue efforts...

(b) (WHEN During the night of July 17,)
(WHAT a 23-foot <WHAT tsunami) hit the
north coast of Papua New Guinea (PNG)>,
(WHY triggered by a 7.0 undersea earthquake
in the area).

(c) WHEN: night WHAT: tsunami, coast
WHY: earthquake

Figure 3: An example of (a) a frame from the TAC
Guided Summarization task with abbreviated slot
descriptions, (b) an annotated TAC contributor, and (c) the
entities that are extracted for evaluation.

5 Evaluating Frame Induction Using 
Guided Summarization Templates 

One issue with the MUC-4 evaluation is the lim- 
ited variety of templates and entities that are avail- 
able. Moreover, this data set was specifically de- 
veloped for information extraction and questions re- 
main whether our approach can generalize beyond 
it. We thus conducted a novel evaluation using the 
TAC guided summarization data set, which contains 
a wide variety of frames and topics. Our evaluation
corresponds to a view of summarization as extracting
structured information from the source text,
and highlights the connection between summarization
and information extraction (White et al., 2001).



Data preparation We use the TAC 2010 guided
summarization data set for our experiments
(Owczarzak and Dang, 2010). This data set provides
templates as defined by the task organizers
and contains 46 document clusters in five domains,
with each cluster comprising 20 documents on a
specific topic. Eight human-written model summaries
are provided for each document cluster. As
part of the Pyramid evaluation method (Nenkova
and Passonneau, 2004), these summaries have
been manually segmented and labeled with slots
from the corresponding template for each segment
(Figure 3).⁴

We first considered defining the task as extract- 
ing entities from the source text, but this annotation 
is not available in TAC, and pilot studies suggested 
that it required nontrivial effort to train average users 
to conduct high-quality annotation reliably. We thus 
defined our task as extracting entities from the model 
summaries instead. As mentioned earlier, TAC slot 
annotation is available for summaries. Furthermore, 
using the summary text has the advantage that slots 
that are considered important in the domain natu- 
rally appear more frequently, whereas unimportant 
text is filtered out. 

Each span that is labeled by a slot is called a con- 
tributor. We convert the contributors into a form that 
is more like the previous MUC evaluation, so that we 
can fairly compare against previous work like C&J 
that were designed to extract information into that 
form. Specifically, we extract the head lemma from 
all the maximal noun phrases found in the contributor.
As in MUC-4, we count a system-extracted
noun phrase as a match if this head word matches
and is extracted from the same document (i.e., summary).
This process can lead to noise, as the meaning
of some contributors depends on a larger phrasal
unit than a noun phrase, but this heuristic normalizes
the representations of the contributors so that
they are amenable to our evaluation. We leave the 
denoising of this process to future work, and believe 
it should be feasible by crowdsourcing. 

4 The full set of slots is available at
http://www.nist.gov/tac/2010/Summarization/Guided-Summ.2010.guidelines.html

                  1-to-1              5-to-1
Systems       P     R    F1       P     R    F1
ProFinder    24    25    24      21    38    27
C&J          58   6.1    11      50    12    20

Table 2: Results on TAC 2010 entity extraction with
N-to-1 mapping for N = 1 and N = 5. Intermediate values
of N produce intermediate results, and are not shown for
brevity.

Method and experiments The induced entity
clusters are mapped to the TAC slots in the TAC
frames according to the best $F_1$ achieved for each
TAC slot. However, one issue is that many TAC
slots are more general than the type of slots found
in MUC. For example, slots like WHY and
COUNTERMEASURES likely correspond to multiple slots
at the granularity of MUC. Thus, we map the N-best
induced slots to TAC slots rather than the 1-best, for
N up to 5. We train ProFinder and a reimplementation
of C&J on the 920 full source texts of TAC
2010, and test them on the 368 model summaries.
We do not provide C&J's model with access to
external data, in order to create fair comparison
conditions to our model. We also eliminate a sentence
relevance classification step from C&J, and the
document relevance classification step from both models,
because all sentences in the summary text are
expected to be relevant. We tune C&J's clustering
thresholds and the parameters to our model by two-fold
cross validation on the summaries, and assume
gold summary classification into the five topic
categories defined by TAC.
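One plausible reading of the N-to-1 mapping is sketched below: rank the induced slots by their individual $F_1$ against a gold slot, keep the top N, and score the union of their extractions. The ranking and merging details are assumptions made for illustration (the text does not spell them out), and the names and toy data are hypothetical. Extractions are represented as (document, head word) pairs, matching the head-word criterion described earlier.

```python
def prf(predicted, gold):
    """Precision, recall, and F1 over sets of (document, head word) pairs."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def n_to_1_mapping(induced, gold, n):
    """Map the n induced slots with the best individual F1 to one gold slot
    and score the union of their extractions."""
    ranked = sorted(induced, key=lambda name: prf(induced[name], gold)[2], reverse=True)
    chosen = ranked[:n]
    merged = set()
    for name in chosen:
        merged |= set(induced[name])
    return chosen, prf(merged, gold)

# Hypothetical gold contributors for one TAC slot and three induced slots.
gold = {("d1", "earthquake"), ("d1", "storm"), ("d2", "flood"), ("d2", "rain")}
induced = {
    "s1": {("d1", "earthquake"), ("d2", "flood")},
    "s2": {("d1", "storm"), ("d1", "coast")},
    "s3": {("d2", "night")},
}
chosen_1, scores_1 = n_to_1_mapping(induced, gold, 1)
chosen_2, scores_2 = n_to_1_mapping(induced, gold, 2)
```

In this toy case the 1-to-1 mapping picks `s1` (precision 1.0, recall 0.5), while the 2-to-1 mapping adds `s2` and trades precision for recall (0.75 each), which mirrors the pattern that larger N favors systems producing many small, precise clusters.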

Results The results on TAC are shown in Table 2.
The overall results are poorer than for the MUC-4
task, but this task is harder given the greater diversity
in frames and slots to be induced. As in the previous
evaluation, our system is able to outperform
C&J in terms of recall and $F_1$, but not precision.
C&J's method produces many small clusters, which
makes it easy to achieve high precision. The N-to-1
mapping procedure can also be seen to favor their
method over ours, since many small clusters with high
precision can be selected to greatly improve recall,
which is indeed the case. However, ProFinder
with 1-to-1 mapping outperforms C&J even with
5-to-1 mapping.

6 Conclusion 

We have presented the first probabilistic approach 
to frame induction and shown that it achieves state- 
of-the-art results on end-to-end entity extraction in 
standard MUC and TAC data sets. Our model is
inspired by recent advances in unsupervised semantic
induction and in content modeling in summarization,
and is easy to extend. We would like to further
investigate frame induction evaluation, for example
to evaluate event clustering in addition to the slots
and entities.